Found it:
Long Mode Segment Limit Enable bit (LMSLE, bit 13 of the EFER MSR)
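For anyone else looking: LMSLE is an AMD-only bit (it is reserved on Intel CPUs), and it is set from ring 0 by doing a read-modify-write of EFER (MSR 0xC0000080) with RDMSR/WRMSR. Below is a minimal sketch, assuming GCC/Clang inline asm on x86-64 and a kernel running at CPL 0. `efer_with_lmsle` and `enable_lmsle` are hypothetical names. Whether a particular AMD part actually honors the bit should be checked against AMD's manual for that processor.

```c
#include <stdint.h>

#define MSR_EFER       0xC0000080u /* Extended Feature Enable Register */
#define EFER_LMSLE_BIT 13          /* AMD: Long Mode Segment Limit Enable */

/* Pure helper: returns the EFER value with LMSLE set, other bits untouched. */
static uint64_t efer_with_lmsle(uint64_t efer)
{
    return efer | (UINT64_C(1) << EFER_LMSLE_BIT);
}

#if defined(__x86_64__)
/* RDMSR/WRMSR are privileged: calling these outside CPL 0 raises #GP. */
static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr(uint32_t msr, uint64_t value)
{
    __asm__ __volatile__("wrmsr"
                         :
                         : "c"(msr), "a"((uint32_t)value),
                           "d"((uint32_t)(value >> 32)));
}

/* Hypothetical kernel init hook: turn on segment limit checks in long mode. */
void enable_lmsle(void)
{
    wrmsr(MSR_EFER, efer_with_lmsle(rdmsr(MSR_EFER)));
}
#endif
```

Only `efer_with_lmsle` can be exercised from user space; `enable_lmsle` belongs in the kernel's early CPU setup.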
Hi,
I am writing my own OS for 64-bit processors and I am stuck on the problem of general protection. My OS will not rely on page faults to implement user-space protection, and I found that segment limit checking might be a way to do it.
This presentation from VMware
http://download3.vmware.com/vmworld/2005/pac346.pdf
says on page 20:
> - Initial AMD64 architecture did not include segmentation in 64-bit mode
> - Segmentation also missing from EM64T
>
> How do we protect the VMM ?
> - 64-bit guest support requires additional hardware assistance
> - Segment limit checks available in 64-bit mode on newer AMD processors
Now, I have one of the newer AMD processors, and my question is: how do I enable segment limit checks on an AMD processor in 64-bit (long) mode? I have downloaded the September 2011 (latest) version of the developer's manual and I can't find how to do this anywhere. Please help.
Hi,
Is there a way to do a 128-bit comparison in one instruction with SSE4.2?
I need to check whether the XMM0 register equals 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (all 128 bits set).
If not, what would be the fastest way to do it with two 64-bit comparisons?
Thanks in advance.
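One way to do this is with PTEST, which arrived with SSE4.1 (so any SSE4.2 CPU has it): it sets CF when `(~a & mask) == 0`, so with an all-ones mask CF is set exactly when every bit of `a` is 1. A minimal sketch with GCC/Clang intrinsics on x86-64 follows; the function names are hypothetical. The SSE2 fallback compares all 16 bytes with PCMPEQB and checks the PMOVMSKB mask. The two-step variant the question mentions would pull both 64-bit halves out with MOVQ and PEXTRQ (SSE4.1) and AND them together, which costs more instructions than PTEST.

```c
#include <emmintrin.h> /* SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <smmintrin.h> /* SSE4.1: _mm_testc_si128 (PTEST) */

/* PTEST: returns CF, i.e. 1 when (~a & all_ones) == 0, i.e. a is all ones. */
__attribute__((target("sse4.1")))
static int is_all_ones_ptest(__m128i a)
{
    return _mm_testc_si128(a, _mm_set1_epi32(-1));
}

/* SSE2 fallback: byte-wise compare, then require all 16 mask bits set. */
static int is_all_ones_sse2(__m128i a)
{
    __m128i eq = _mm_cmpeq_epi8(a, _mm_set1_epi32(-1));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

`smmintrin.h` also provides `_mm_test_all_ones`, which wraps the same PTEST idea.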
You are a big help; it was exactly what I was looking for!
Thanks
Hi,
I am looking for a list of instruction costs, in clock cycles, for the IA-32 Intel architecture, something like this:
movdqa <-- 3 clock cycles
mov <-- 1 clock cycle
jmp <-- 1 clock cycle
Is this info available somewhere? I checked the manuals but didn't find it.
I'd appreciate any help.
Thanks!
Here is a wonderful resource to help you understand the proper use of FPU instructions:
http://webster.cs.ucr.edu/AoA/Windows/HTML/RealArithmetic.html#998833
Oh, I see. So this relic can't move a value directly from a general-purpose register to its stack, because the stack entries are 64 bits long (well, actually 80 bits long) and the thing was designed in the late 1970s. Wow!
Now that all micros are 64-bit, I hope Intel reconsiders and adds direct move instructions to the stack, because if the FPU is good for anything, it is storage: there are almost 512 bits in its registers. As for the calculations, you do them on the XMMs, which is much faster.
Hi,
I have a new question: is it possible to pop from the FPU stack into a general-purpose register without doing a FIST to memory?
In other words, is there any way to do
FISTP RAX
or something equivalent? (I can't write to memory because it is too slow.)
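x87 has no FISTP-to-register form: apart from FNSTSW AX (status word only), its stores always target memory. The usual workaround is a store/reload through a local stack slot, which is served from the L1 cache rather than DRAM. A minimal sketch with GCC extended asm on x86-64 follows; `fistp_to_int64` is a hypothetical name. If the values already live in XMM registers, SSE2's CVTTSD2SI (`_mm_cvttsd_si64`) converts a double straight into a general-purpose register.

```c
#include <stdint.h>

/* Converts value to int64 with FISTP m64int. The "t" constraint puts value
   in ST(0); FISTP pops it, so ST(0) must be listed as clobbered ("st").
   The memory operand is a local, so the round trip stays in L1. */
static int64_t fistp_to_int64(long double value)
{
    int64_t result;
    /* Uses the current x87 rounding mode (round-to-nearest by default). */
    __asm__("fistpll %0" : "=m"(result) : "t"(value) : "st");
    return result;
}
```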
Hi,
How many 64-bit registers can I use inside an Intel i7 CPU for storage, to feed them later into the XMM registers? I currently use only XMM0-15, MM0-7 and R8-15. I know I can use RAX, RBX, RCX, RDX and the eight registers inside the FPU (ST0-ST7), but what others can I use? Can I use the stack registers? Thanks in advance.
I've attached my application code in case it helps.
///////////////////////////////////////////
pipe_line_math.h
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h> // ushort (glibc)
// NOTE: no asm operands are declared; the code assumes the x86-64 System V ABI
// has left data in rdi and rands in rsi. In 64-bit mode address through the
// 64-bit registers (rdi/rsi); edi/esi would truncate the pointers.
void pipe_mult_ushort(ushort *data,ushort *rands)
{
__asm__ __volatile__(".intel_syntax noprefix\n\t"
//// in this section we pull as much data as we can into the CPU
//// to minimize the DRAM delay and store it where we can
"movdqa xmm0,[rdi]\n\t" // load xmm0 & xmm1
"movdqa xmm1,[rsi]\n\t"
"movdqa xmm2,0x10[rdi]\n\t" // load xmm2 & xmm3
"movdqa xmm3,0x10[rsi]\n\t"
"movdqa xmm4,0x20[rdi]\n\t" // load xmm4 & xmm5
"movdqa xmm5,0x20[rsi]\n\t"
"movdqa xmm6,0x30[rdi]\n\t" // load xmm6 & xmm7
"movdqa xmm7,0x30[rsi]\n\t"
"movdqa xmm8,0x40[rdi]\n\t" // load xmm8 & xmm9
"movdqa xmm9,0x40[rsi]\n\t"
"movdqa xmm10,0x50[rdi]\n\t" // load xmm10 & xmm11
"movdqa xmm11,0x50[rsi]\n\t"
"movdqa xmm12,0x60[rdi]\n\t" // load xmm12 & xmm13
"movdqa xmm13,0x60[rsi]\n\t"
"movdqa xmm14,0x70[rdi]\n\t" // load xmm14 & xmm15
"movdqa xmm15,0x70[rsi]\n\t"
"movq mm0,0x80[rdi]\n\t" // load mm0-mm7 (the MMX registers alias the x87 registers)
"movq mm1,0x80[rsi]\n\t"
"movq mm2,0x88[rdi]\n\t"
"movq mm3,0x88[rsi]\n\t"
"movq mm4,0x90[rdi]\n\t"
"movq mm5,0x90[rsi]\n\t"
"movq mm6,0x98[rdi]\n\t"
"movq mm7,0x98[rsi]\n\t"
"mov r8,0xA0[rdi]\n\t" // store some in the extended 64-bit registers
"mov r9,0xA0[rsi]\n\t"
"mov r10,0xA8[rdi]\n\t"
"mov r11,0xA8[rsi]\n\t"
"mov r12,0xB0[rdi]\n\t"
"mov r13,0xB0[rsi]\n\t"
"mov r14,0xB8[rdi]\n\t"
"mov r15,0xB8[rsi]\n\t"
// all registers available for data storage are now filled; proceed with the calculations
// calc xmms first
"pmullw xmm0,xmm1\n\t" // calc xmm0
"pmullw xmm2,xmm3\n\t" …
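On the register question: MM0-MM7 are mapped onto the x87 register file, so the FPU stack and the MMX registers are not separate storage, and code that uses MMX must execute EMMS before touching x87 again. A different approach from the posted one (not the poster's method) is to use SSE2 intrinsics, let the compiler allocate registers, and let the out-of-order core overlap the independent loads. The sketch below assumes 16-byte-aligned arrays, a length that is a multiple of 8, and that the products are written back into `data`; that last point is an assumption, because the original code is cut off after the first PMULLWs. `mult_ushort_sse2` is a hypothetical name.

```c
#include <stddef.h>
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_mullo_epi16 (PMULLW) */

/* Element-wise multiply of n unsigned shorts, keeping the low 16 bits of
   each product (what PMULLW does). Requires 16-byte alignment (as MOVDQA
   does) and n % 8 == 0. Results overwrite data (assumed). */
static void mult_ushort_sse2(unsigned short *data,
                             const unsigned short *rands, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        __m128i a = _mm_load_si128((const __m128i *)(data + i));
        __m128i b = _mm_load_si128((const __m128i *)(rands + i));
        _mm_store_si128((__m128i *)(data + i), _mm_mullo_epi16(a, b));
    }
}
```

For example, 300 * 300 = 90000, which PMULLW truncates to 90000 - 65536 = 24464.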
Well, clock cycles certainly do not equate to execution time on modern processors. But if you still think you need this information, go directly to the source:
http://www.intel.com/design/corei7/documentation.htm
http://www.intel.com/products/processor/manuals/index.htm
Those are free downloads!
I know that, and it is exactly why I am asking: so I can arrange my code for optimal pipelining.
I have them all; there is nothing helpful there.
Nothing left but to do the dirty work of timing each one myself, I guess.
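If you do end up timing instructions yourself, here is a minimal sketch that estimates the latency of a dependent chain of 64-bit ADDs with the time-stamp counter. It assumes GCC/Clang on x86-64 (`__rdtsc` and `_mm_lfence` from `x86intrin.h`); `cycles_per_dependent_add` is a hypothetical helper. Treat the number as rough: it includes loop overhead, and on many modern CPUs the TSC ticks at a constant reference rate rather than the actual core clock, so repeat the runs and compare against an empty-loop baseline.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_lfence (GCC/Clang) */

/* Average TSC ticks per iteration of a loop whose body is one ADD that
   depends on the previous iteration's result (a latency-bound chain). */
static double cycles_per_dependent_add(uint64_t iterations)
{
    uint64_t x = 0;
    _mm_lfence();               /* keep earlier work out of the timed region */
    uint64_t start = __rdtsc();
    _mm_lfence();
    for (uint64_t i = 0; i < iterations; ++i) {
        /* x += i in asm, so the compiler cannot fold the loop away */
        __asm__ __volatile__("add %1, %0" : "+r"(x) : "r"(i));
    }
    _mm_lfence();               /* wait for the chain before reading the TSC */
    uint64_t end = __rdtsc();
    return (double)(end - start) / (double)iterations;
}
```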
Hi,
I have been looking for this and can't find anything on the subject anywhere. I would like to know how many clock cycles it takes to execute each instruction in the complete asm instruction set of the Intel i7 microprocessor (including SSE and FPU instructions). Does anyone have a list of these values?
Thanks in advance
Nulik